Introduction to Statistics
Bennett Kleinberg
Week 2
- Central tendency
- Variability of data
Note on sampling
Sampling
- sampling is the process by which \(n\) observations are taken from a population of size \(N\)
- this is one of the most important methods in the behavioural and social sciences
- if the sampling is wrong, the rest is BS
- GIGO principle (garbage in, garbage out)
- more in week 4
- for now: sample = subset of the population
Part 1: Central tendency
- Aim: we want to describe the data
- specifically: we want to express the center of the data distribution
- remember: think of data = distribution
Example data
- We take a sample of \(n=100\) students at TiU
- And ask: how many hours per week do you spend on YouTube?
- Answers in full hours
Looking at the histogram

Describing central tendency
The MODE:
- simple definition: the score (or category) with the highest frequency
- works for all scales of data (think about nominal data)
Obtaining the mode
We look at the frequency table, and select the most frequently chosen option:
| Hours | Frequency |
|---|---|
| 10 | 17 |
| 12 | 16 |
| 11 | 15 |
| 8 | 12 |
| 9 | 9 |
The mode is 10 hours.
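A minimal sketch of reading the mode off a frequency table, using only the five rows shown above. The name `frequencies` is an assumption made for illustration and does not come from the course material.

```python
# Frequency table from the example: hours per week -> number of students.
# Only the five rows shown on the slide are included.
frequencies = {10: 17, 12: 16, 11: 15, 8: 12, 9: 9}

# The mode is the score (here: hours) with the highest frequency.
mode_hours = max(frequencies, key=frequencies.get)
print(mode_hours)  # 10
```

Because the mode only needs counts, the same approach works for nominal categories (for example, favourite platform) where a mean would be meaningless.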
The location of the mode

The mode and distribution shapes
(demo)
Describing central tendency
The MEAN:
- often called the average
- exact definition: the sum of all scores divided by the number of scores
Statistical notation:
\(\mu=\frac{\sum{X}}{N}\) (population mean)
\(M=\frac{\sum{X}}{n}\) (sample mean)
Calculating the mean
- Sample size: \(n=5\)
- YouTube hours watched data: \(5,7,9,14,6\)
\(\sum{X} = 5+7+9+14+6 = 41\)
\(M=\frac{\sum{X}}{n} = \frac{41}{5} = 8.20\)
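The same calculation can be sketched in a few lines of Python. The variable names are illustrative assumptions; `statistics.mean` from the standard library is used only as a cross-check.

```python
from statistics import mean

hours = [5, 7, 9, 14, 6]   # YouTube hours for the n = 5 students

total = sum(hours)         # sum of all scores
n = len(hours)             # number of scores
sample_mean = total / n    # M = sum(X) / n

print(total, n, sample_mean)  # 41 5 8.2
print(mean(hours))            # same value via the standard library
```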
Where is it in the distribution?

Mode and mean

Why not always the mean?
Suppose there are 10 friends (a, b, c, … j) in a bar. Each of them says how many hours they spent on YouTube last week.
Here’s their data:
| Friend | Hours |
|---|---|
| a | 15 |
| b | 6 |
| c | 2 |
| d | 2 |
| e | 4 |
| f | 12 |
| g | 6 |
| h | 15 |
| i | 3 |
| j | 7 |
…
Now another person enters. This friend, “k”, is a binge watcher. He says that last week he watched 50 hours of YouTube.
What do you think will happen to the mean?
New histogram

Beware of outliers
- Mean before: \(M=\frac{\sum{X}}{n} = \frac{72}{10} = 7.20\)
- Mean with binge-watcher: \(M=\frac{\sum{X}}{n} = \frac{122}{11} = 11.09\)
Extreme values can affect the mean!
The extreme values are often called outliers.
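To see the effect of the outlier directly, here is a small sketch that recomputes the mean with and without friend k. The list names are assumptions for illustration.

```python
from statistics import mean

friends = [15, 6, 2, 2, 4, 12, 6, 15, 3, 7]  # friends a to j
with_k = friends + [50]                       # the binge watcher joins

mean_before = mean(friends)
mean_after = mean(with_k)

print(round(mean_before, 2))  # 7.2
print(round(mean_after, 2))   # 11.09
```

A single extreme score shifts the mean by almost four hours, even though ten of the eleven scores did not change.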
Another illustration
There are a hundred people in a bar. The average (mean) income is 30,000 EUR. Now Jeff Bezos walks in and suddenly everyone is, on average, a billionaire.
These problems can be addressed:
- mean trimming (not in this course)
- another metric
Describing central tendency
The MEDIAN:
- often called the midpoint
- exact definition: the median splits the distribution in half
Example
The friend data:

| Friend | Hours |
|---|---|
| a | 15 |
| b | 6 |
| c | 2 |
| d | 2 |
| e | 4 |
| f | 12 |
| g | 6 |
| h | 15 |
| i | 3 |
| j | 7 |
| k | 50 |
Special cases
Distributions without a “clear” midpoint:
- data: \(4,15,13,14,38,3\)
- sorted data: \(3,4,13,14,15,38\)
- median?
In this case, we take the two middle values and obtain the average:
- median = \(\frac{13+14}{2}=13.5\)
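Both cases (odd and even number of scores) can be checked with a short sketch. It uses `statistics.median` from the Python standard library; the list names are illustrative assumptions.

```python
from statistics import median

friends_with_k = [15, 6, 2, 2, 4, 12, 6, 15, 3, 7, 50]  # 11 scores (odd)
special_case = [4, 15, 13, 14, 38, 3]                    # 6 scores (even)

# Odd number of scores: the median is the middle value of the sorted data.
print(sorted(friends_with_k))
median_with_k = median(friends_with_k)
print(median_with_k)  # 6

# Even number of scores: the median is the mean of the two middle values.
print(sorted(special_case))
median_special = median(special_case)
print(median_special)  # 13.5

# Without the binge watcher, the median stays at 6 hours.
median_without_k = median(friends_with_k[:-1])
print(median_without_k)  # 6.0
```

Unlike the mean, the median is the same with or without friend k, which is why it is the usual alternative when outliers are present.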
Part 2: Variability
- Aim: we want to describe the data
- specifically: we want to express how much the scores in the data differ
- also called the spread of the data (or lack thereof)
New data example
- first-attempt grades in Intro to Statistics for \(N=10\) students
| Student | Grade |
|---|---|
| A K | 5 |
| B L | 3 |
| C M | 6 |
| D N | 6 |
| E O | 7 |
| F P | 8 |
| G Q | 6 |
| H R | 9 |
| I S | 8 |
| J T | 10 |
How can we express data variability?
- The easiest way: we take the lowest value and the highest value
- \(\min grade = 3\)
- \(\max grade = 10\)
\(range = \max - \min\)
See also p. 102 in the book.
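As a quick sketch, the range for the ten grades can be computed as follows (variable names are assumptions for illustration):

```python
grades = [5, 3, 6, 6, 7, 8, 6, 9, 8, 10]  # the N = 10 grades

lowest = min(grades)
highest = max(grades)
grade_range = highest - lowest  # range = max - min

print(lowest, highest, grade_range)  # 3 10 7
```

The range depends on only two scores, so it says nothing about how the other eight grades are spread between them.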
A bit more nuanced
- maybe we calculate how much each score differs from the (population) mean
- \(\mu = 6.8\)
| Student | Grade | Deviation \(X-\mu\) |
|---|---|---|
| A K | 5 | -1.8 |
| B L | 3 | -3.8 |
| C M | 6 | -0.8 |
| D N | 6 | -0.8 |
| E O | 7 | 0.2 |
| F P | 8 | 1.2 |
| G Q | 6 | -0.8 |
| H R | 9 | 2.2 |
| I S | 8 | 1.2 |
| J T | 10 | 3.2 |
What is problematic?
This procedure gives us the deviation score (from the mean) for each value
\(deviation = X - \mu\)
- Think about what the mean actually is
- It is - by definition - the balancing point
- Have a look…
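The balancing-point property can also be checked numerically. The sketch below computes the deviation scores for the ten grades and adds them up; variable names are illustrative assumptions, and the comparison uses a small tolerance because floating-point sums are rarely exactly zero.

```python
grades = [5, 3, 6, 6, 7, 8, 6, 9, 8, 10]

mu = sum(grades) / len(grades)          # population mean
deviations = [x - mu for x in grades]   # deviation = X - mu
total_deviation = sum(deviations)

print(mu)
print([round(d, 1) for d in deviations])

# Negative and positive deviations cancel out (up to rounding error).
print(abs(total_deviation) < 1e-9)  # True
```

This is why the plain sum (or mean) of the deviations cannot serve as a measure of variability: it is zero for every data set.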
Deviation and the mean

Deviations sum to 0

Common trick: Squaring the difference
| Student | Grade | Deviation \(X-\mu\) | Squared deviation \((X-\mu)^2\) |
|---|---|---|---|
| A K | 5 | -1.8 | 3.24 |
| B L | 3 | -3.8 | 14.44 |
| C M | 6 | -0.8 | 0.64 |
| D N | 6 | -0.8 | 0.64 |
| E O | 7 | 0.2 | 0.04 |
| F P | 8 | 1.2 | 1.44 |
| G Q | 6 | -0.8 | 0.64 |
| H R | 9 | 2.2 | 4.84 |
| I S | 8 | 1.2 | 1.44 |
| J T | 10 | 3.2 | 10.24 |
The \(x^2\) trick
- removes negative values
- “punishes” larger values
- \(2^2 = 4\)
- \(4^2 = 16\)
- Note: differences are also squared
- When we double \(x\), we quadruple \(x^2\)
From deviation to variance
We can obtain a more meaningful measure now.
The mean of squared deviations is called the variance.
\(var = \frac{\sum{(X-\mu)^2}}{N}\)
Stepwise: deviation
For the first five students only (\(N=5\)): \(\mu = 5.4\)
| Student | Grade | Deviation \(X-\mu\) |
|---|---|---|
| A K | 5 | -0.4 |
| B L | 3 | -2.4 |
| C M | 6 | 0.6 |
| D N | 6 | 0.6 |
| E O | 7 | 1.6 |
Stepwise: squared deviation
| Student | Grade | Deviation \(X-\mu\) | Squared deviation \((X-\mu)^2\) |
|---|---|---|---|
| A K | 5 | -0.4 | 0.16 |
| B L | 3 | -2.4 | 5.76 |
| C M | 6 | 0.6 | 0.36 |
| D N | 6 | 0.6 | 0.36 |
| E O | 7 | 1.6 | 2.56 |
\(var = \frac{\sum{(X-\mu)^2}}{N} = \frac{9.2}{5} = 1.84\)
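The stepwise calculation can be sketched in Python as follows, treating the five grades as the whole population. Variable names are assumptions for illustration; results are rounded because floating-point arithmetic introduces tiny errors.

```python
grades = [5, 3, 6, 6, 7]   # the five grades used in the stepwise example

N = len(grades)
mu = sum(grades) / N                                   # step 1: mean
deviations = [x - mu for x in grades]                  # step 2: X - mu
squared_deviations = [d ** 2 for d in deviations]      # step 3: (X - mu)^2
ss = sum(squared_deviations)                           # step 4: sum of squares
variance = ss / N                                      # step 5: divide by N

print(round(mu, 2))        # 5.4
print(round(ss, 2))        # 9.2
print(round(variance, 2))  # 1.84
```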
Stepwise: the standard deviation
- among the most frequently used statistics for variability
- standard in most research papers
\(SD = \sqrt{var}\)
\(\sigma = \sqrt{\frac{\sum{(X-\mu)^2}}{N}}\)
Here: \(\sigma = \sqrt{\frac{9.2}{5}} = \sqrt{1.84} = 1.36\)
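A short sketch of the last step, using `pvariance` and `pstdev` from the Python standard library, which both divide by \(N\). The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import pstdev, pvariance

grades = [5, 3, 6, 6, 7]

var = pvariance(grades)   # population variance: SS / N
sigma = sqrt(var)         # SD = square root of the variance

print(round(var, 2))             # 1.84
print(round(sigma, 2))           # 1.36
print(round(pstdev(grades), 2))  # 1.36, same result in one call
```

Taking the square root brings the statistic back to the original unit (grade points), which makes the standard deviation easier to interpret than the variance.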
Sum of squares
- an alternative approach is to first go through the sum of squared deviations (SS)
- this: \(\sum{(X-\mu)^2}\)
Then:
\(var = \frac{SS}{N}\)
\(\sigma = \sqrt{\frac{SS}{N}}\)
This is why \(var\) is also written as \(\sigma^2\)
Remember populations and samples?
So far, the variability statistics were for the population.
Statistics computed from a sample are biased estimates of the population values (i.e. they over- or underestimate them):
- here this means the sample will underestimate the variability of the population
- we can correct for this
- this is where we need the sum of squares
Correcting for bias
We make the value slightly larger, by decreasing the denominator:
\(sample\ variance = \frac{SS}{n-1}\)
\(s = \sqrt{\frac{SS}{n-1}}\)
Compare:
- \(\frac{SS}{N} = \frac{9.2}{5} = 1.84\) vs \(\frac{SS}{n-1} = \frac{9.2}{4} = 2.30\)
- \(\sqrt{\frac{9.2}{5}} = 1.36\) vs \(\sqrt{\frac{9.2}{4}} = 1.52\)
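The comparison can be reproduced with the Python standard library, where `pvariance`/`pstdev` divide by \(N\) and `variance`/`stdev` divide by \(n-1\). Treating the same five grades once as a population and once as a sample is done here only to mirror the comparison above; the variable names are assumptions.

```python
from statistics import pstdev, pvariance, stdev, variance

grades = [5, 3, 6, 6, 7]

population_var = pvariance(grades)  # SS / N
sample_var = variance(grades)       # SS / (n - 1)
population_sd = pstdev(grades)      # sqrt(SS / N)
sample_sd = stdev(grades)           # sqrt(SS / (n - 1))

print(round(population_var, 2), round(sample_var, 2))  # 1.84 2.3
print(round(population_sd, 2), round(sample_sd, 2))    # 1.36 1.52
```

The correction matters most for small samples: with \(n=5\) the denominator shrinks by 20%, whereas with \(n=100\) it shrinks by only 1%.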
Examples of reporting summary statistics
show that the judgments are closer to the true emotion score in the longer texts (M=1.19, SD=1.88) than in the shorter ones (M=2.00, SD=2.35), Cohen’s d = 0.38 [99% CI: 0.30; 0.45]
Recap
- we can describe the center of the data
- we can also describe how much the data is spread out
- range
- deviation –> variance –> standard deviation
- correcting for sample bias in sample statistics